Jinqiao Wang

Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, objecteye.Inc

ReCALL: Recalibrating Capability Degradation for MLLM-based Composed Image Retrieval

Feb 02, 2026
Via arXiv

Towards Governance-Oriented Low-Altitude Intelligence: A Management-Centric Multi-Modal Benchmark With Implicitly Coordinated Vision-Language Reasoning Framework

Jan 27, 2026
Via arXiv

PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning

Jan 19, 2026
Via arXiv

GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models

Jan 08, 2026
Via arXiv

Dichotomous Diffusion Policy Optimization

Dec 31, 2025
Via arXiv

Improving Generalization in LLM Structured Pruning via Function-Aware Neuron Grouping

Dec 28, 2025
Via arXiv

ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning

Dec 21, 2025
Via arXiv

UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations

Dec 12, 2025
Via arXiv

PixCLIP: Achieving Fine-grained Visual Language Understanding via Any-granularity Pixel-Text Alignment Learning

Nov 06, 2025
Via arXiv

From Seeing to Predicting: A Vision-Language Framework for Trajectory Forecasting and Controlled Video Generation

Oct 01, 2025
Via arXiv